Multiple-objective portfolio optimization¶

INTRODUCTION¶

Your task is to solve a multiple-objective portfolio optimization problem.

  • Use the basic Markowitz model from 1952 (see Lecture 1).
  • Solving here means constructing Pareto front approximations.
  • The dataset is the same as for the portfolio game part 1 (bundle1.zip).
  • The dataset consists of the historical prices of 20 assets.
  • The bundle contains 20 files (*.txt) linked to different assets.
  • The name of the file suggests the asset's name.
  • The structure of every file is as follows:
  1. The first line contains the name of the asset.
  2. The second line provides the number of data points N.
  3. The following N lines are data points with the structure: time, price.
  • The historical timeline for all assets is time $\in$ [0,100].
  • Future predictions should be calculated for time = 200.

Goal:

  • Load data, make predictions, and build the model.
  • Illustrate your predictions (this can be done in the Jupyter notebook).
  • Then, implement the WSM (weighted-sum method) and ECM (ε-constraint method); see the tutorial on quadratic programming provided below.
  • Run your implementations for different calculation limits (e.g., the number of weight vectors for WSM). Compare the methods' efficiency in finding unique Pareto optimal solutions. Finally, illustrate the generated Pareto fronts.

Short tutorial on the cvxopt library for quadratic programming¶

In [1]:
import numpy as np
from cvxopt import matrix, solvers

QP Optimization Problem¶

General model:¶

$max$ $\boldsymbol{cx} - \dfrac{1}{2}\boldsymbol{x}^T\boldsymbol{Qx}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{x} \geq \boldsymbol{0}$

But the library uses the following form:¶

$min$ $\boldsymbol{cx} + \dfrac{1}{2}\boldsymbol{x}^T\boldsymbol{Qx}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{Ax} = \boldsymbol{b}$

Example¶

$min$ $2x^2_1+x_2^2+x_1x_2+x_1+x_2$
$s.t.$
$x_1 \geq 0$
$x_2 \geq 0$
$x_1 + x_2 = 1$

Hence:¶

In [2]:
Q = matrix([ [4.0, 1.0], [1.0, 2.0] ]) ## [4, 1] is 1st column, not row!
In [3]:
c = matrix([1.0, 1.0]) ### a 2x1 column vector (the linear-term coefficients)
In [4]:
A = matrix([1.0, 1.0], (1,2)) ### (1, 2) = dimensions (1 row and 2 columns)
In [5]:
b = matrix(1.0) 
In [6]:
G = matrix([[-1.0,0.0],[0.0,-1.0]]) ### multiplied both sides by -1
In [7]:
h = matrix([0.0,0.0]) ### multiplied both sides by -1
In [8]:
solQP=solvers.qp(Q, c, G, h, A, b)
     pcost       dcost       gap    pres   dres
 0:  1.8889e+00  7.7778e-01  1e+00  3e-16  2e+00
 1:  1.8769e+00  1.8320e+00  4e-02  2e-16  6e-02
 2:  1.8750e+00  1.8739e+00  1e-03  1e-16  5e-04
 3:  1.8750e+00  1.8750e+00  1e-05  1e-16  5e-06
 4:  1.8750e+00  1.8750e+00  1e-07  3e-16  5e-08
Optimal solution found.
In [9]:
print(solQP.keys())
dict_keys(['x', 'y', 's', 'z', 'status', 'gap', 'relative gap', 'primal objective', 'dual objective', 'primal infeasibility', 'dual infeasibility', 'primal slack', 'dual slack', 'iterations'])
In [10]:
print(solQP['x'])
print(solQP['primal objective'])
[ 2.50e-01]
[ 7.50e-01]

1.8750000000000182
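As a quick sanity check (an addition, not cvxopt output), substituting the reported solution $x = (0.25, 0.75)$ back into the objective reproduces the optimal value:

```python
x1, x2 = 0.25, 0.75
objective = 2*x1**2 + x2**2 + x1*x2 + x1 + x2
print(objective)  # 1.875, matching solQP['primal objective']
```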

We can also solve LP problems:¶

$min$ $\boldsymbol{c}\boldsymbol{x}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{Ax} = \boldsymbol{b}$ (optional)

Example¶

$min$ $2x_1+x_2$
$s.t.$
$-x_1 +x_2 \leq 1$
$x_1 + x_2 \geq 2$
$x_2 \geq 0$
$x_1 - 2x_2 \leq 4$

In [11]:
G = matrix([ [-1.0, -1.0, 0.0, 1.0], [1.0, -1.0, -1.0, -2.0] ])
h = matrix([ 1.0, -2.0, 0.0, 4.0 ])
c = matrix([ 2.0, 1.0 ])
solLP = solvers.lp(c,G,h)  
###!!!! OPTIONALLY A and b can be provided (equality constraints) as in solQP=solvers.qp(Q, c, G, h, A, b)
     pcost       dcost       gap    pres   dres   k/t
 0:  2.6471e+00 -7.0588e-01  2e+01  8e-01  2e+00  1e+00
 1:  3.0726e+00  2.8437e+00  1e+00  1e-01  2e-01  3e-01
 2:  2.4891e+00  2.4808e+00  1e-01  1e-02  2e-02  5e-02
 3:  2.4999e+00  2.4998e+00  1e-03  1e-04  2e-04  5e-04
 4:  2.5000e+00  2.5000e+00  1e-05  1e-06  2e-06  5e-06
 5:  2.5000e+00  2.5000e+00  1e-07  1e-08  2e-08  5e-08
Optimal solution found.
In [12]:
print(solLP.keys())
dict_keys(['x', 'y', 's', 'z', 'status', 'gap', 'relative gap', 'primal objective', 'dual objective', 'primal infeasibility', 'dual infeasibility', 'primal slack', 'dual slack', 'residual as primal infeasibility certificate', 'residual as dual infeasibility certificate', 'iterations'])
In [13]:
print(solLP['x'])
print(solLP['primal objective'])
[ 5.00e-01]
[ 1.50e+00]

2.499999989554308
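Again as an added sanity check, the reported solution $x = (0.5, 1.5)$ satisfies all four original constraints (the first two are active) and attains the reported objective:

```python
x1, x2 = 0.5, 1.5
assert -x1 + x2 <= 1          # active: -0.5 + 1.5 = 1
assert x1 + x2 >= 2           # active:  0.5 + 1.5 = 2
assert x2 >= 0
assert x1 - 2*x2 <= 4
print(2*x1 + x2)  # 2.5, matching solLP['primal objective']
```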

Portfolio optimization¶

Import the necessary libraries (numpy, pandas, matplotlib, cvxopt).

In [14]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cvxopt import matrix, solvers
from sklearn.linear_model import Lasso
from scipy.signal import savgol_filter

# For reproducible randomness
np.random.seed(42)

Load the data from *Part1.txt files. Each file’s first line is the asset name, second line is N, and then N lines with “time price”.

In [15]:
def load_asset_data(data_folder="data"):
    asset_names = []
    asset_times = []
    asset_prices = []
    
    txt_files = [f for f in os.listdir(data_folder) if f.endswith("Part1.txt")]
    
    for fname in txt_files:
        path = os.path.join(data_folder, fname)
        with open(path, "r") as f:
            # 1) asset name
            asset_name = f.readline().strip()
    
            # 2) number of data points
            N_line = f.readline().strip()
            N = int(N_line)
    
            # 3) read time, price lines
            times = []
            prices = []
            for _ in range(N):
                line = f.readline().strip()
                t_str, p_str = line.split()
                times.append(float(t_str))
                prices.append(float(p_str))
    
            asset_names.append(asset_name)
            asset_times.append(times)
            asset_prices.append(prices)
    
    print(f"Found {len(asset_names)} assets.")
    print("First few asset names:", asset_names[:5])
    return asset_names, asset_times, asset_prices

asset_names, asset_times, asset_prices = load_asset_data()
Found 20 assets.
First few asset names: ['SafeAndCare', 'Moneymakers', 'Fuel4', 'CPU-XYZ', 'MarsProject']
In [16]:
def plot_asset_forecasts(asset_names, asset_times, asset_prices, forecast_results, training_offset=100, color="red"):
    """
    Plots forecast data for multiple assets on a 5x4 grid.
    
    Parameters:
      asset_names (list): List of asset names.
      asset_times (list of arrays/lists): Time points for each asset.
      asset_prices (list of arrays/lists): Historical prices for each asset.
      forecast_results (dict): Dictionary mapping asset names to forecast results for a specific method.
          Each entry for an asset should contain:
            - "price_at_training": forecast price at training time.
            - "price_at_forecast": forecast price at forecast time.
            - "full_reconstruction": (optional) tuple (recon_times, reconstruction) for the full reconstruction curve.
            - "forecast_time": (optional) forecast time (defaults to 200 if not provided).
      training_offset (int): The offset between forecast time and training time (default 100).
      color (str): Color for all forecast markers and lines.
    
    The function creates a grid of subplots (max 20 assets), where for each asset:
      - Historical data is plotted as black scatter points.
      - A training marker is plotted at (forecast_time - training_offset) using a circle marker.
      - A forecast marker is plotted at forecast_time using an 'x' marker.
      - If available, a reconstruction curve is plotted.
    """
    
    # Set up a 5x4 grid (max 20 assets)
    fig, axes = plt.subplots(5, 4, figsize=(20, 25))
    axes = axes.flatten()
    
    num_assets = min(len(asset_names), 20)
    
    for i in range(num_assets):
        ax = axes[i]
        asset = asset_names[i]
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        
        # Plot historical data
        ax.scatter(times, prices, label="Historical Prices", color="black")
        
        # Retrieve forecast result for this asset
        forecast = forecast_results[asset]
        forecast_time = forecast.get("forecast_time", 200)
        training_time = forecast_time - training_offset
        
        # Plot training marker
        ax.scatter([training_time], [forecast["price_at_training"]],
                   marker="o", color=color, s=100,
                   label=f"Training @ t={training_time}")
        
        # Plot forecast marker
        ax.scatter([forecast_time], [forecast["price_at_forecast"]],
                   marker="x", color=color, s=100,
                   label=f"Forecast @ t={forecast_time}")
        
        # Plot reconstruction curve if available
        if "full_reconstruction" in forecast:
            recon_times, reconstruction = forecast["full_reconstruction"]
            ax.plot(recon_times, reconstruction, label="Reconstruction",
                    color=color, linewidth=2)
        
        ax.set_title(asset)
        ax.set_xlabel("Time")
        ax.set_ylabel("Price")
        ax.legend(fontsize='small')
    
    # Remove any unused subplots if there are fewer than 20 assets.
    for j in range(num_assets, len(axes)):
        fig.delaxes(axes[j])
    
    plt.tight_layout()
    plt.show()
In [17]:
all_results = {}

Perform a simple linear regression (degree = 1) for each asset using times in [0, 100]. Extrapolate to time = 200 to get a predicted price, and convert that predicted price growth into a predicted return $\mu_i$.

In [18]:
import numpy as np
from sklearn.linear_model import Lasso
from scipy.signal import savgol_filter

def baseline_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200):
    """
    
    Returns a dictionary with, for each asset:
      - "model_predictions": (training_times, model_predictions) on [training_start, training_end]
      - "full_reconstruction": (recon_times, linear_reconstruction) on [training_start, forecast_time]
      - "price_at_training": predicted price at training_end
      - "price_at_forecast": predicted price at forecast_time
      - "predicted_return": computed return
    """
    baseline_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        # Use training data: times between training_start and training_end
        mask = (times >= training_start) & (times <= training_end)
        t_used = times[mask]
        p_used = prices[mask]
        coeffs = np.polyfit(t_used, p_used, deg=1)
        # Model predictions on training interval
        model_predictions = np.polyval(coeffs, t_used)
        price_at_training = np.polyval(coeffs, training_end)
        price_at_forecast = np.polyval(coeffs, forecast_time)
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        # Generate a new time vector for full reconstruction:
        recon_times = np.linspace(training_start, forecast_time, forecast_time - training_start)
        linear_reconstruction = np.polyval(coeffs, recon_times)
        
        baseline_pred[name] = {
            "model_predictions": (t_used, model_predictions),
            "full_reconstruction": (recon_times, linear_reconstruction),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return baseline_pred
    
linear_results = baseline_forecast(asset_names, asset_times, asset_prices,
                                          training_start=0, training_end=100, forecast_time=200)
plot_asset_forecasts(asset_names, asset_times, asset_prices, linear_results, training_offset=100, color="red")
In [19]:
import numpy as np
from scipy.signal import savgol_filter
from sklearn.linear_model import Lasso

def build_candidate_library(x, candidate_frequencies):
    """
    Build a design matrix with a constant, linear, quadratic term,
    and for each candidate frequency, sine and cosine functions.
    """
    features = []
    feature_names = []
    features.append(np.ones_like(x))
    feature_names.append('1')
    features.append(x)
    feature_names.append('x')
    features.append(x**2)
    feature_names.append('x^2')
    for w in candidate_frequencies:
        features.append(np.sin(2 * np.pi * w * x))
        feature_names.append(f'sin(2π*{w:.2f}x)')
        features.append(np.cos(2 * np.pi * w * x))
        feature_names.append(f'cos(2π*{w:.2f}x)')
    X = np.column_stack(features)
    return X, feature_names

def sparse_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200, alpha=0.01):
    """
    For each asset, perform:
      1. Denoise the full signal.
      2. From training data ([training_start, training_end]), perform FFT on the denoised data
         to select candidate frequencies.
      3. Compute the linear trend using raw training data.
      4. Subtract the trend from the denoised training data.
      5. Build a candidate library over the full time series.
      6. Fit a sparse (LASSO) model on the detrended training data.
      7. Reconstruct the full signal (by adding back the extrapolated trend).
      8. Evaluate the learned function at any new x using the candidate library.
    """
    sparse_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        
        # 1. Denoise the full signal.
        prices_denoised = savgol_filter(prices, window_length=11, polyorder=2)
        
        # 2. Restrict to training data and perform FFT on the denoised training data.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train_denoised = prices_denoised[mask]
        N = len(t_train)
        T = t_train[1] - t_train[0] if N > 1 else 1
        fft_vals = np.fft.fft(p_train_denoised)
        freq = np.fft.fftfreq(N, T)
        pos_mask = freq > 0
        freq_pos = freq[pos_mask]
        fft_magnitude = np.abs(fft_vals)[pos_mask]
        threshold = np.mean(fft_magnitude) + 1 * np.std(fft_magnitude)
        candidate_frequencies = freq_pos[fft_magnitude > threshold]
        candidate_frequencies = np.unique(np.round(candidate_frequencies, 2))
        
        # 3. Compute trend using raw training data (to preserve the true slope).
        p_train_raw = prices[mask]
        coeffs_trend = np.polyfit(t_train, p_train_raw, 1)
        trend_train = np.polyval(coeffs_trend, t_train)
        
        # 4. Subtract the trend from the denoised training data.
        p_train_detrended = p_train_denoised - trend_train
        
        # 5. Build candidate library over the full time series.
        X, _ = build_candidate_library(times, candidate_frequencies)
        X_train = X[mask, :]
        
        # 6. Fit sparse regression (LASSO) on detrended training data.
        lasso = Lasso(alpha=alpha, max_iter=10000)
        lasso.fit(X_train, p_train_detrended)
        coefs = lasso.coef_
        intercept_sparse = lasso.intercept_
        
        # 7. Reconstruct the full signal.
        p_detrended_reconstructed = intercept_sparse + X.dot(coefs)
        trend_full = np.polyval(coeffs_trend, times)
        p_reconstructed = p_detrended_reconstructed + trend_full
        # Clip negative values in the training reconstruction:
        p_reconstructed = np.maximum(p_reconstructed, 0)
        
        # 8. Define a prediction function that evaluates the learned function at any new x.
        def predict_new(x_new):
            X_new, _ = build_candidate_library(np.array([x_new]), candidate_frequencies)
            value = intercept_sparse + X_new.dot(coefs) + np.polyval(coeffs_trend, x_new)
            # Return non-negative prediction.
            return max(value.item(), 0)
        
        price_at_training = predict_new(training_end)
        price_at_forecast = predict_new(forecast_time)
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        
        # Generate a new reconstruction time vector from training_start to forecast_time.
        recon_times = np.linspace(training_start, forecast_time, forecast_time - training_start)
        X_new, _ = build_candidate_library(recon_times, candidate_frequencies)
        p_detrended_new = intercept_sparse + X_new.dot(coefs)
        p_reconstructed_new = p_detrended_new + np.polyval(coeffs_trend, recon_times)
        # Clip negative values.
        p_reconstructed_new = np.maximum(p_reconstructed_new, 0)
        
        sparse_pred[name] = {
            "model_predictions": (t_train, np.maximum(p_reconstructed[mask], 0)),  # for diagnostic on training data
            "full_reconstruction": (recon_times, p_reconstructed_new),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return sparse_pred

# Example usage:
sparse_results = sparse_forecast(asset_names, asset_times, asset_prices,
                                 training_start=0, training_end=100, forecast_time=200)
plot_asset_forecasts(asset_names, asset_times, asset_prices, sparse_results, training_offset=100, color="red")

Illustrate your predictions by plotting the historical data and the linear fit for a few assets, plus the forecasted price at t=200.

In [20]:
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def arima_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200, order=(1,1,1), use_log=True):
    """
    For each asset, fit an ARIMA model on training data ([training_start, training_end])
    and forecast up to forecast_time.
    
    If use_log is True, a logarithmic transformation is applied to the training data,
    and forecasts are exponentiated to ensure all predictions are positive.
    
    Computes expected return as:
      (price_at_forecast - price_at_training) / price_at_training.
    
    Returns a dictionary with, for each asset:
      - "model_predictions": (training_times, in-sample predictions) on [training_start, training_end]
      - "full_reconstruction": (recon_times, arima_predictions) on [training_start, forecast_time]
      - "price_at_training": predicted price at training_end (last training prediction)
      - "price_at_forecast": predicted price at forecast_time (last forecast value)
      - "predicted_return": computed return
    """
    arima_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        
        # Select training data based on the provided time window.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        
        if use_log:
            # Check that all training prices are positive.
            if np.any(p_train <= 0):
                raise ValueError(f"Asset {name} has non-positive prices; cannot use log transform.")
            p_train_trans = np.log(p_train)
        else:
            p_train_trans = p_train.copy()
        
        # Fit the ARIMA model on the (possibly transformed) training data.
        try:
            model = ARIMA(p_train_trans, order=order)
            model_fit = model.fit()
        except Exception as e:
            print(f"ARIMA fit failed for asset {name}: {e}")
            continue
        
        # In-sample prediction on the training data.
        pred_train_trans = model_fit.predict(start=0, end=len(p_train_trans)-1)
        # If using log transform, exponentiate predictions.
        if use_log:
            pred_train = np.exp(pred_train_trans)
        else:
            pred_train = pred_train_trans
        
        # Determine forecast steps.
        forecast_steps = int(forecast_time - training_end)
        if forecast_steps <= 0:
            raise ValueError("forecast_time must be greater than training_end")
        
        # Forecast future values.
        forecast_trans = model_fit.forecast(steps=forecast_steps)
        if use_log:
            forecast = np.exp(forecast_trans)
        else:
            forecast = forecast_trans
        
        # Construct full time series reconstruction.
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        full_times = np.concatenate([t_train, forecast_times])
        full_predictions = np.concatenate([pred_train, forecast])
        
        # Get the predicted price at training_end and forecast_time.
        price_at_training = pred_train[-1]
        price_at_forecast = forecast[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        
        # In case any residual negatives appear (shouldn't happen with log-transform):
        full_predictions = np.maximum(full_predictions, 0)
        
        arima_pred[name] = {
            "model_predictions": (t_train, pred_train),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return arima_pred

# Example usage:
# (Ensure asset_names, asset_times, asset_prices, and plot_asset_forecasts are defined.)
arima_results = arima_forecast(asset_names, asset_times, asset_prices,
                               training_start=0, training_end=100, forecast_time=200,
                               order=(2,1,2), use_log=True)
plot_asset_forecasts(asset_names, asset_times, asset_prices, arima_results, training_offset=100, color="red")
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "
In [21]:
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def exponential_smoothing_forecast(asset_names, asset_times, asset_prices,
                                   training_start=0, training_end=100, forecast_time=200,
                                   trend='add', seasonal=None, seasonal_periods=None,
                                   use_multiplicative=False):
    """
    Returns:
      A dictionary where each key is an asset name and each value is a dictionary containing:
        - "model_predictions": (training_times, in-sample predictions) on [training_start, training_end]
        - "full_reconstruction": (recon_times, exponential smoothing predictions) on [training_start, forecast_time]
        - "price_at_training": predicted price at training_end (last in-sample prediction)
        - "price_at_forecast": predicted price at forecast_time (last forecast value)
        - "predicted_return": computed return as (price_at_forecast - price_at_training) / price_at_training
    """
    exp_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        
        # Select training data based on the provided time window.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        
        # Optionally switch to multiplicative components if data are strictly positive.
        # Use per-asset copies so one asset's choice does not leak into later iterations.
        trend_i, seasonal_i = trend, seasonal
        if use_multiplicative:
            if np.any(p_train <= 0):
                print(f"Warning: Asset {name} has non-positive values. Multiplicative model not appropriate. Using additive model.")
            else:
                trend_i = 'mul'
                if seasonal_i is not None:
                    seasonal_i = 'mul'
        
        # Fit the exponential smoothing model.
        try:
            model = ExponentialSmoothing(p_train, trend=trend_i, seasonal=seasonal_i, seasonal_periods=seasonal_periods)
            model_fit = model.fit()
        except Exception as e:
            print(f"Exponential Smoothing model fit failed for asset {name}: {e}")
            continue
        
        # In-sample fitted values (for training period).
        pred_train = model_fit.fittedvalues
        # Clip negative values.
        pred_train = np.maximum(pred_train, 0)
        
        # Determine number of forecast steps.
        forecast_steps = int(forecast_time - training_end)
        if forecast_steps <= 0:
            raise ValueError("forecast_time must be greater than training_end")
        
        # Forecast future values.
        forecast_values = model_fit.forecast(steps=forecast_steps)
        forecast_values = np.maximum(forecast_values, 0)
        
        # Generate a time vector for the forecast period.
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        # Build the full reconstruction time series.
        full_times = np.concatenate([t_train, forecast_times])
        full_predictions = np.concatenate([pred_train, forecast_values])
        full_predictions = np.maximum(full_predictions, 0)  # ensure non-negative
        
        # Get the predicted prices at training_end and forecast_time.
        price_at_training = pred_train[-1]
        price_at_forecast = forecast_values[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        
        exp_pred[name] = {
            "model_predictions": (t_train, pred_train),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return exp_pred

exp_results = exponential_smoothing_forecast(asset_names, asset_times, asset_prices,
                                             training_start=0, training_end=100, forecast_time=200,
                                             trend='add', seasonal=None, seasonal_periods=None,
                                             use_multiplicative=True)
plot_asset_forecasts(asset_names, asset_times, asset_prices, exp_results, training_offset=100, color="red")
In [22]:
import numpy as np
import pandas as pd
from sktime.forecasting.naive import NaiveForecaster

def naive_forecast(asset_names, asset_times, asset_prices,
                   training_start=0, training_end=100, forecast_time=200, strategy="last"):
    """
    Returns:
      A dictionary where for each asset the following keys are returned:
        - "model_predictions": (training_times, in-sample naive predictions) on [training_start, training_end]
        - "full_reconstruction": (recon_times, naive forecast predictions) on [training_start, forecast_time]
        - "price_at_training": reference price at training_end (from actual training data)
        - "price_at_forecast": forecasted price at forecast_time
        - "predicted_return": computed return as (price_at_forecast - price_at_training) / price_at_training
    """
    naive_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        
        # Select training data: times between training_start and training_end.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        
        # Compute in-sample naive predictions.
        if strategy == "last":
            in_sample_predictions = np.empty_like(p_train)
            in_sample_predictions[0] = p_train[0]
            if len(p_train) > 1:
                in_sample_predictions[1:] = p_train[:-1]
        elif strategy == "drift":
            if len(p_train) > 1:
                drift = (p_train[-1] - p_train[0]) / (len(p_train) - 1)
            else:
                drift = 0
            in_sample_predictions = np.empty_like(p_train)
            in_sample_predictions[0] = p_train[0]
            for j in range(1, len(p_train)):
                in_sample_predictions[j] = p_train[j-1] + drift
        elif strategy == "mean":
            mean_value = np.mean(p_train)
            in_sample_predictions = np.full_like(p_train, fill_value=mean_value)
        else:
            raise ValueError("Unsupported strategy. Use 'last', 'drift', or 'mean'.")
        
        # Clip negative in-sample predictions.
        in_sample_predictions = np.maximum(in_sample_predictions, 0)
        
        # Determine the number of forecast steps.
        forecast_steps = int(forecast_time - training_end)
        if forecast_steps <= 0:
            raise ValueError("forecast_time must be greater than training_end")
        
        # Prepare the training series for sktime.
        # Assuming training times are consecutive integers.
        y_train = pd.Series(p_train, index=pd.RangeIndex(start=int(t_train[0]),
                                                         stop=int(t_train[0]) + len(t_train)))
        
        # Fit the NaiveForecaster from sktime.
        forecaster = NaiveForecaster(strategy=strategy)
        forecaster.fit(y_train)
        fh = np.arange(1, forecast_steps + 1)  # forecasting horizon (steps ahead)
        y_forecast = forecaster.predict(fh)
        
        # Clip forecast predictions to ensure non-negativity.
        forecast_values = np.maximum(y_forecast.values, 0)
        
        # Create forecast time points (assuming integer time steps).
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        
        # Full reconstruction: combine in-sample predictions and forecast values.
        full_times = np.concatenate([t_train, forecast_times])
        full_predictions = np.concatenate([in_sample_predictions, forecast_values])
        
        # Determine prices at training_end and forecast_time.
        price_at_training = p_train[-1]  # using actual last training price as reference
        price_at_forecast = forecast_values[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        
        naive_pred[name] = {
            "model_predictions": (t_train, in_sample_predictions),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return naive_pred

# Example usage:
naive_results = naive_forecast(asset_names, asset_times, asset_prices,
                               training_start=0, training_end=100, forecast_time=200, strategy="last")
plot_asset_forecasts(asset_names, asset_times, asset_prices, naive_results, training_offset=100, color="green")
[Figure: per-asset naive forecasts plotted in green against the training data]
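The `"drift"` strategy used above has a simple closed form: it extends the straight line through the first and last training observations, i.e. $\hat{y}_{T+h} = y_T + h \cdot (y_T - y_1)/(T-1)$. A minimal NumPy check of that formula (toy series, no sktime required):

```python
import numpy as np

def drift_forecast(y, h):
    """Drift forecast: extend the line through the first and last observations."""
    T = len(y)
    slope = (y[-1] - y[0]) / (T - 1)
    return y[-1] + slope * np.arange(1, h + 1)

y = np.array([1.0, 2.0, 2.5, 4.0])   # toy price series; slope = (4 - 1) / 3 = 1.0
print(drift_forecast(y, 3))           # -> [5. 6. 7.]
```

This matches the in-sample drift used in `naive_forecast`, where each point is predicted as the previous observation plus the average step.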
In [23]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def decision_tree_forecast(asset_names, asset_times, asset_prices,
                           training_start=0, training_end=100, forecast_time=200,
                           seasonal_period=None, max_depth=None):
    """
    For each asset, forecast prices using a Decision Tree Regressor with conditional deseasonalization and detrending.
    
    Steps for each asset:
      1. Extract training data from training_start to training_end.
      2. Fit a linear trend (using np.polyfit) to the training data.
      3. Detrend the training data by subtracting the trend.
      4. If seasonal_period is provided and > 1, compute a seasonal component (average residual per seasonal position)
         and remove it (deseasonalize the data).
      5. Fit a Decision Tree Regressor on the deseasonalized residuals.
      6. For forecasting, predict the residuals on new time points, add back the trend (evaluated using the linear model)
         and the seasonal component (if available) to obtain the final forecast.
    """
    dt_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        
        # 1. Select training data
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        
        # 2. Fit a linear trend for detrending (using a first-degree polynomial)
        coeffs = np.polyfit(t_train, p_train, deg=1)
        trend_train = np.polyval(coeffs, t_train)
        
        # 3. Detrend the training data
        p_detrended = p_train - trend_train
        
        # 4. Conditional deseasonalization if a seasonal period is provided
        if seasonal_period is not None and seasonal_period > 1:
            # Compute the average seasonal effect for each season position
            seasonal_effect = np.zeros(seasonal_period)
            count = np.zeros(seasonal_period)
            for idx, t in enumerate(t_train):
                pos = int((t - training_start) % seasonal_period)
                seasonal_effect[pos] += p_detrended[idx]
                count[pos] += 1
            # Avoid division by zero and compute average
            seasonal_effect = np.where(count > 0, seasonal_effect / count, 0)
            # Build the seasonal component for each training time
            seasonal_train = np.array([seasonal_effect[int((t - training_start) % seasonal_period)]
                                       for t in t_train])
            # Remove seasonal effect from the detrended data
            p_deseasonalized = p_detrended - seasonal_train
        else:
            p_deseasonalized = p_detrended
            seasonal_train = np.zeros_like(p_train)  # no seasonal component
        
        # 5. Fit Decision Tree Regressor on the deseasonalized (residual) training data.
        dt_reg = DecisionTreeRegressor(max_depth=max_depth)
        dt_reg.fit(t_train.reshape(-1, 1), p_deseasonalized)
        
        # In-sample prediction on training data (residual prediction)
        pred_deseasonalized_train = dt_reg.predict(t_train.reshape(-1, 1))
        # Reconstruct training predictions by adding back the trend and seasonal component (if any)
        pred_train_reconstructed = pred_deseasonalized_train + trend_train
        if seasonal_period is not None and seasonal_period > 1:
            pred_train_reconstructed += seasonal_train
        
        # 6. Forecasting for times after training_end
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        # Predict the residuals for forecast times
        pred_deseasonalized_forecast = dt_reg.predict(forecast_times.reshape(-1, 1))
        # Evaluate trend for forecast times
        trend_forecast = np.polyval(coeffs, forecast_times)
        # If seasonal, compute seasonal component for forecast times
        if seasonal_period is not None and seasonal_period > 1:
            seasonal_forecast = np.array([seasonal_effect[int((t - training_start) % seasonal_period)]
                                          for t in forecast_times])
        else:
            seasonal_forecast = np.zeros_like(forecast_times)
        # Reconstruct forecasted predictions
        forecast_pred = pred_deseasonalized_forecast + trend_forecast + seasonal_forecast
        
        # 7. Build full reconstruction of predictions (training + forecast)
        full_times = np.concatenate([t_train, forecast_times])
        full_predictions = np.concatenate([pred_train_reconstructed, forecast_pred])
        
        # Determine price at training_end (last training prediction) and at forecast_time (last forecast value)
        price_at_training = pred_train_reconstructed[-1]
        price_at_forecast = forecast_pred[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        
        dt_pred[name] = {
            "model_predictions": (t_train, pred_train_reconstructed),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
        
    return dt_pred

# Example usage:
dt_results = decision_tree_forecast(asset_names, asset_times, asset_prices,
                                    training_start=0, training_end=100, forecast_time=200,
                                    seasonal_period=50, max_depth=50)
plot_asset_forecasts(asset_names, asset_times, asset_prices, dt_results, training_offset=100, color="orange")
[Figure: per-asset decision-tree forecasts plotted in orange against the training data]
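The detrend/deseasonalize pipeline in steps 2–4 is invertible by construction: subtracting the fitted trend and the per-position seasonal averages, then adding them back, must reproduce the original series exactly. A quick synthetic round-trip check (toy series, not the bundle data):

```python
import numpy as np

t = np.arange(0, 60)
season = np.array([2.0, -1.0, 0.5, -1.5])      # period-4 seasonal pattern
y = 0.3 * t + 5.0 + season[t % 4]              # trend + seasonality, no noise

coeffs = np.polyfit(t, y, deg=1)               # step 2: fit linear trend
trend = np.polyval(coeffs, t)
resid = y - trend                              # step 3: detrend

period = 4                                     # step 4: average residual per season position
seasonal_effect = np.array([resid[t % period == k].mean() for k in range(period)])
deseason = resid - seasonal_effect[t % period]

# Round trip: residuals + seasonal component + trend recover the original series.
reconstructed = deseason + seasonal_effect[t % period] + trend
assert np.allclose(reconstructed, y)
```

The regressor in step 5 only ever sees `deseason`; the trend and seasonal component are re-added at prediction time, exactly as in `decision_tree_forecast`.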
In [24]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN

def create_dataset(data, look_back):
    """
    Create sliding-window sequences for training.
    
    Args:
      data: 1D numpy array of normalized asset prices.
      look_back: number of time steps to use as input.
      
    Returns:
      X: numpy array of shape (n_samples, look_back)
      y: numpy array of shape (n_samples,)
    """
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data[i:i+look_back])
        y.append(data[i+look_back])
    return np.array(X), np.array(y)

def rnn_forecast(asset_names, asset_times, asset_prices,
                 training_start=0, training_end=100, forecast_time=200,
                 look_back=10, epochs=20, batch_size=64):
    """
    For each asset, this function:
      1. Selects training data in [training_start, training_end].
      2. Computes the min and max of the training data and normalizes it.
      3. Creates sliding-window sequences and trains an RNN model on the normalized data.
      4. Makes in-sample predictions and recursively forecasts until forecast_time (all in normalized space).
      5. Transforms the predictions back to the original scale using the stored min and max.
      6. Clips any negative predictions (in-sample and forecast) to 0.
    """
    rnn_pred = {}
    
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        
        # Select training data using the provided time window.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        
        if len(p_train) <= look_back:
            raise ValueError(f"Not enough training data for asset {name} with look_back = {look_back}.")
        
        # Get the min and max from the training data (original scale).
        train_min = p_train.min()
        train_max = p_train.max()
        
        # Normalize the training data.
        p_train_norm = (p_train - train_min) / (train_max - train_min)
        
        # Create sliding-window sequences on the normalized data.
        X_train, y_train = create_dataset(p_train_norm, look_back)
        X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
        
        # Build the RNN model using the specified architecture.
        model = Sequential()
        model.add(SimpleRNN(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
        model.add(SimpleRNN(units=50, return_sequences=False))
        model.add(Dense(units=1))
        model.compile(optimizer='adam', loss='mean_squared_error')
        
        # Train the RNN model on normalized data (verbose=0 to suppress output)
        model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
        
        # In-sample predictions on the normalized training data.
        pred_train_norm = model.predict(X_train, verbose=0).flatten()
        # Inverse transform the normalized predictions to the original scale.
        pred_train = pred_train_norm * (train_max - train_min) + train_min
        # Clip any negative predictions to 0.
        pred_train = np.maximum(pred_train, 0)
        # The in-sample predictions correspond to times t_train[look_back:].
        t_train_pred = t_train[look_back:]
        
        # Determine forecast steps (assumes integer time steps).
        forecast_steps = int(forecast_time - training_end)
        if forecast_steps <= 0:
            raise ValueError("forecast_time must be greater than training_end")
        
        # Recursive forecasting: use the last look_back normalized values to predict the next value.
        last_sequence = p_train_norm[-look_back:].tolist()
        forecasted_norm = []
        for _ in range(forecast_steps):
            input_seq = np.array(last_sequence[-look_back:]).reshape((1, look_back, 1))
            next_val_norm = model.predict(input_seq, verbose=0)[0, 0]
            forecasted_norm.append(next_val_norm)
            last_sequence.append(next_val_norm)
        
        # Inverse-transform the forecasted normalized predictions.
        forecasted = np.array(forecasted_norm) * (train_max - train_min) + train_min
        # Clip negative forecast values.
        forecasted = np.maximum(forecasted, 0)
        
        # Construct full reconstruction: combine in-sample predictions and forecasted values.
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        full_times = np.concatenate([t_train_pred, forecast_times])
        full_predictions = np.concatenate([pred_train, forecasted])
        full_predictions = np.maximum(full_predictions, 0)
        
        price_at_training = pred_train[-1]
        price_at_forecast = forecasted[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        
        rnn_pred[name] = {
            "model_predictions": (t_train_pred, pred_train),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    
    return rnn_pred

# (Assuming asset_names, asset_times, asset_prices, and plot_asset_forecasts are defined)
rnn_results = rnn_forecast(asset_names, asset_times, asset_prices,
                           training_start=0, training_end=100, forecast_time=200,
                           look_back=30, epochs=20, batch_size=64)
plot_asset_forecasts(asset_names, asset_times, asset_prices, rnn_results, training_offset=100, color="brown")
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/keras/src/layers/rnn/rnn.py:200: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
[Figure: per-asset RNN forecasts plotted in brown against the training data]
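The sliding-window construction and the recursive forecasting loop do not depend on Keras itself. The same mechanics can be verified with a hypothetical stand-in predictor (here simply the mean of the window, a made-up "model" used only for illustration), which avoids training a network:

```python
import numpy as np

def make_windows(data, look_back):
    # Same sliding-window construction as create_dataset above.
    X = np.array([data[i:i + look_back] for i in range(len(data) - look_back)])
    y = data[look_back:]
    return X, y

def recursive_forecast(history, predict_fn, look_back, steps):
    # Feed each prediction back in as the newest observation.
    seq = list(history[-look_back:])
    out = []
    for _ in range(steps):
        out.append(predict_fn(np.array(seq[-look_back:])))
        seq.append(out[-1])
    return np.array(out)

series = np.arange(10, dtype=float)    # 0..9
X, y = make_windows(series, look_back=3)
print(X.shape, y.shape)                # -> (7, 3) (7,)

# Hypothetical stand-in "model": predicts the mean of the current window.
preds = recursive_forecast(series, predict_fn=np.mean, look_back=3, steps=4)
```

In `rnn_forecast` the `predict_fn` role is played by `model.predict` on the normalized sequence, and the loop structure is identical.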
In [25]:
import numpy as np
import matplotlib.pyplot as plt

def aggregate_and_plot_mean_median(asset_names, asset_times, asset_prices,
                                   rnn_results, dt_results, naive_results, exp_results,
                                   arima_results, sparse_results, linear_results,
                                   training_start=0, training_end=100, forecast_time=200):
    """
    For each asset, this function:
      - Prints a table of predicted training price, forecast price, and expected return (in %)
        for each forecasting method.
      - Computes the pointwise median and mean forecasts across methods.
      - Displays two separate 5×4 grids: one for the median forecast and one for the mean forecast.
    
    Expected return is reported as the price ratio (forecast / training_price) * 100%, so 100% means no change.
    
    Returns a dictionary with two keys: "median" and "mean". Each value is a dictionary where each asset
    is represented with keys:
      - "model_predictions": (training_times, aggregated in-sample predictions) – here we fill the training period with the aggregated training price.
      - "full_reconstruction": (common_times, aggregated full forecast)
      - "price_at_training": aggregated training price (scalar)
      - "price_at_forecast": aggregated forecast price (scalar)
      - "predicted_return": aggregated expected return (scalar)
    """
    # Combine forecast dictionaries.
    methods = {
        'RNN': rnn_results,
        'DecisionTree': dt_results,
        'Naive': naive_results,
        'ExpSmoothing': exp_results,
        'ARIMA': arima_results,
        'Sparse': sparse_results,
        'Linear': linear_results
    }
    
    # Create a common integer time axis (one point per time step).
    common_times = np.arange(training_start, forecast_time + 1)
    
    # Containers for aggregated results.
    median_results = {}
    mean_results = {}
    
    # Loop over assets.
    for idx, asset in enumerate(asset_names):
        print(f"\n{'='*60}\nAsset: {asset}")
        
        # Get asset training data.
        times = np.array(asset_times[idx])
        prices = np.array(asset_prices[idx])
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        real_training_price = prices[mask][-1]
        print(f"Real training price (last value in training data): {real_training_price:.4f}")
        
        # Initialize lists to collect scalar predictions and aligned full forecasts.
        training_preds = []    # from each method: price_at_training
        forecast_preds = []    # from each method: price_at_forecast
        exp_returns = []       # expected return in percentage
        full_predictions_aligned = []  # each element is an array aligned on common_times
        
        # Print table header.
        header = f"{'Method':<15} {'TrainPred':>10} {'Forecast':>10} {'Return (%)':>12}"
        print(header)
        print("-" * len(header))
        
        for method_name, result in methods.items():
            if asset not in result:
                continue
            res = result[asset]
            train_pred = res["price_at_training"]
            forecast_pred = res["price_at_forecast"]
            # Expected return = (forecast / train_pred)*100%
            exp_return = (forecast_pred / train_pred) * 100.0
            print(f"{method_name:<15} {train_pred:10.4f} {forecast_pred:10.4f} {exp_return:12.2f}")
            training_preds.append(train_pred)
            forecast_preds.append(forecast_pred)
            exp_returns.append(exp_return)
            # Interpolate full reconstruction on common_times.
            method_times, method_preds = res["full_reconstruction"]
            aligned = np.interp(common_times, method_times, method_preds)
            full_predictions_aligned.append(aligned)
        
        # Compute aggregated scalar values.
        median_train = np.median(training_preds)
        median_forecast = np.median(forecast_preds)
        median_return = np.median(exp_returns)
        mean_train = np.mean(training_preds)
        mean_forecast = np.mean(forecast_preds)
        mean_return = np.mean(exp_returns)
        print("-" * len(header))
        print(f"{'Median':<15} {median_train:10.4f} {median_forecast:10.4f} {median_return:12.2f}")
        print(f"{'Mean':<15} {mean_train:10.4f} {mean_forecast:10.4f} {mean_return:12.2f}")
        
        # Compute pointwise aggregated forecasts.
        full_predictions_aligned = np.array(full_predictions_aligned)
        median_full = np.median(full_predictions_aligned, axis=0)
        mean_full = np.mean(full_predictions_aligned, axis=0)
        
        # Save aggregated results for this asset.
        # For in-sample predictions, we simply fill the training period with the aggregated training price.
        median_results[asset] = {
            "model_predictions": (t_train, np.full_like(t_train, median_train)),
            "full_reconstruction": (common_times, median_full),
            "price_at_training": median_train,
            "price_at_forecast": median_forecast,
            "predicted_return": median_return
        }
        mean_results[asset] = {
            "model_predictions": (t_train, np.full_like(t_train, mean_train)),
            "full_reconstruction": (common_times, mean_full),
            "price_at_training": mean_train,
            "price_at_forecast": mean_forecast,
            "predicted_return": mean_return
        }
    
    # Plotting: Two separate 5x4 charts.
    n_assets = len(asset_names)
    n_rows, n_cols = 5, 4
    
    # Figure for Median forecasts.
    fig_med, axes_med = plt.subplots(n_rows, n_cols, figsize=(20, 25), sharex=True, sharey=False)
    axes_med = axes_med.flatten()
    for idx, asset in enumerate(asset_names):
        ax = axes_med[idx]
        # Plot aggregated median forecast.
        ax.plot(common_times, median_results[asset]["full_reconstruction"][1], label="Median Forecast", color='black', linewidth=2)
        # Plot each method's forecast.
        for method_name, result in methods.items():
            if asset in result:
                m_times, m_preds = result[asset]["full_reconstruction"]
                aligned = np.interp(common_times, m_times, m_preds)
                ax.plot(common_times, aligned, label=method_name, linestyle="--", alpha=0.6)
        # Plot real training data.
        a_times = np.array(asset_times[idx])
        a_prices = np.array(asset_prices[idx])
        mask = (a_times >= training_start) & (a_times <= training_end)
        ax.plot(a_times[mask], a_prices[mask], label="Real Training", color="blue", linewidth=2)
        ax.set_title(asset)
        ax.set_xlabel("Time")
        ax.set_ylabel("Price")
        ax.legend(fontsize=8)
    # Hide any unused subplots.
    for j in range(idx+1, n_rows*n_cols):
        axes_med[j].axis('off')
    fig_med.tight_layout()
    
    # Figure for Mean forecasts.
    fig_mean, axes_mean = plt.subplots(n_rows, n_cols, figsize=(20, 25), sharex=True, sharey=False)
    axes_mean = axes_mean.flatten()
    for idx, asset in enumerate(asset_names):
        ax = axes_mean[idx]
        # Plot aggregated mean forecast.
        ax.plot(common_times, mean_results[asset]["full_reconstruction"][1], label="Mean Forecast", color='black', linewidth=2)
        # Plot each method's forecast.
        for method_name, result in methods.items():
            if asset in result:
                m_times, m_preds = result[asset]["full_reconstruction"]
                aligned = np.interp(common_times, m_times, m_preds)
                ax.plot(common_times, aligned, label=method_name, linestyle="--", alpha=0.6)
        # Plot real training data.
        a_times = np.array(asset_times[idx])
        a_prices = np.array(asset_prices[idx])
        mask = (a_times >= training_start) & (a_times <= training_end)
        ax.plot(a_times[mask], a_prices[mask], label="Real Training", color="blue", linewidth=2)
        ax.set_title(asset)
        ax.set_xlabel("Time")
        ax.set_ylabel("Price")
        ax.legend(fontsize=8)
    for j in range(idx+1, n_rows*n_cols):
        axes_mean[j].axis('off')
    fig_mean.tight_layout()
    
    plt.show()
    
    # Return the aggregated results.
    return {"median": median_results, "mean": mean_results}

results = aggregate_and_plot_mean_median(asset_names, asset_times, asset_prices,
                                         rnn_results, dt_results, naive_results, exp_results,
                                         arima_results, sparse_results, linear_results,
                                         training_start=0, training_end=100, forecast_time=200)
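The alignment-and-aggregation step inside `aggregate_and_plot_mean_median` boils down to interpolating each method's forecast onto a shared grid with `np.interp`, then taking a pointwise median or mean. A toy sketch with two hypothetical method forecasts on different time grids:

```python
import numpy as np

common = np.arange(0, 5)                       # shared time grid

# Two hypothetical method forecasts on different time grids.
m1_t, m1_p = np.array([0, 2, 4]), np.array([1.0, 3.0, 5.0])
m2_t, m2_p = np.array([0, 1, 2, 3, 4]), np.array([2.0, 2.0, 2.0, 2.0, 2.0])

aligned = np.array([np.interp(common, m1_t, m1_p),
                    np.interp(common, m2_t, m2_p)])
median_curve = np.median(aligned, axis=0)
mean_curve = np.mean(aligned, axis=0)
print(median_curve)                            # -> [1.5 2.  2.5 3.  3.5]
```

With only two methods the median equals the mean; with the seven methods above, the median is the more robust aggregate because a single diverging forecaster (e.g. an exploding recursive forecast) cannot drag it far.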
============================================================
Asset: SafeAndCare
Real training price (last value in training data): 6.3601
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 6.3818     6.6640       104.42
DecisionTree        6.3601     5.4799        86.16
Naive               6.3601     6.3601       100.00
ExpSmoothing        6.3384     5.0240        79.26
ARIMA               6.3513     6.0570        95.37
Sparse              6.2786     7.8440       124.93
Linear              6.4551     5.5750        86.36
--------------------------------------------------
Median              6.3601     6.0570        95.37
Mean                6.3608     6.1434        96.64

============================================================
Asset: Moneymakers
Real training price (last value in training data): 4.1005
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 4.4788     4.5579       101.76
DecisionTree        4.1005     6.0619       147.83
Naive               4.1005     4.1005       100.00
ExpSmoothing        4.0759     0.4886        11.99
ARIMA               4.1003     2.9480        71.90
Sparse              4.2734    10.0247       234.58
Linear              4.5796     6.5410       142.83
--------------------------------------------------
Median              4.1005     4.5579       101.76
Mean                4.2442     4.9604       115.84

============================================================
Asset: Fuel4
Real training price (last value in training data): 6.0939
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 6.1473     7.5696       123.14
DecisionTree        6.0939     3.9304        64.50
Naive               6.0939     6.0939       100.00
ExpSmoothing        6.1702     4.5776        74.19
ARIMA               6.1637     5.4801        88.91
Sparse              6.1140     3.0142        49.30
Linear              5.6435     3.4800        61.66
--------------------------------------------------
Median              6.1140     4.5776        74.19
Mean                6.0609     4.8780        80.24

============================================================
Asset: CPU-XYZ
Real training price (last value in training data): 8.1245
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 8.3556    11.8942       142.35
DecisionTree        8.1245     7.9867        98.30
Naive               8.1245     8.1245       100.00
ExpSmoothing        8.3347    10.8390       130.05
ARIMA               8.3324     8.4699       101.65
Sparse              8.2151    13.5739       165.23
Linear              6.7088     6.5710        97.95
--------------------------------------------------
Median              8.2151     8.4699       101.65
Mean                8.0279     9.6370       119.36

============================================================
Asset: MarsProject
Real training price (last value in training data): 3.4077
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 3.4145     2.8495        83.45
DecisionTree        3.4077     3.7016       108.63
Naive               3.4077     3.4077       100.00
ExpSmoothing        3.6200     2.7465        75.87
ARIMA               3.6997     3.7121       100.34
Sparse              3.6921     7.7155       208.97
Linear              3.6912     3.9852       107.96
--------------------------------------------------
Median              3.6200     3.7016       100.34
Mean                3.5618     4.0169       112.17

============================================================
Asset: WorldNow
Real training price (last value in training data): 8.1922
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 8.2146     8.1388        99.08
DecisionTree        8.1922    11.5655       141.18
Naive               8.1922     8.1922       100.00
ExpSmoothing        8.2196     8.2684       100.59
ARIMA               8.2129     8.1328        99.02
Sparse              8.3721    17.1030       204.29
Linear              7.9307    11.3040       142.53
--------------------------------------------------
Median              8.2129     8.2684       100.59
Mean                8.1906    10.3864       126.67

============================================================
Asset: Photons
Real training price (last value in training data): 6.3065
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 6.0941     1.9155        31.43
DecisionTree        6.3065     8.1080       128.56
Naive               6.3065     6.3065       100.00
ExpSmoothing        6.3002    20.4726       324.95
ARIMA               6.2983    12.9393       205.44
Sparse              6.0467     0.6724        11.12
Linear              6.7308     8.5322       126.76
--------------------------------------------------
Median              6.3002     8.1080       126.76
Mean                6.2976     8.4209       132.61

============================================================
Asset: EnviroLike
Real training price (last value in training data): 6.3693
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 6.5224     5.9498        91.22
DecisionTree        6.3693     5.1785        81.30
Naive               6.3693     6.3693       100.00
ExpSmoothing        6.6534     3.6944        55.53
ARIMA               6.6833     6.5567        98.11
Sparse              6.7181     9.7550       145.20
Linear              6.3914     5.2006        81.37
--------------------------------------------------
Median              6.5224     5.9498        91.22
Mean                6.5296     6.1006        93.25

============================================================
Asset: BetterTechnology
Real training price (last value in training data): 5.4368
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 5.1511     4.0262        78.16
DecisionTree        5.4368     4.5465        83.63
Naive               5.4368     5.4368       100.00
ExpSmoothing        5.1834     1.5362        29.64
ARIMA               5.2241     4.4289        84.78
Sparse              5.4819     1.7524        31.97
Linear              5.7359     4.8456        84.48
--------------------------------------------------
Median              5.4368     4.4289        83.63
Mean                5.3786     3.7961        70.38

============================================================
Asset: PearPear
Real training price (last value in training data): 2.5805
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 2.5912     0.2367         9.14
DecisionTree        2.5805     3.7159       144.00
Naive               2.5805     2.5805       100.00
ExpSmoothing        2.5799     0.3529        13.68
ARIMA               2.6076     0.3666        14.06
Sparse              2.5296     0.0000         0.00
Linear              4.2897     5.4252       126.47
--------------------------------------------------
Median              2.5805     0.3666        14.06
Mean                2.8227     1.8111        58.19

============================================================
Asset: PositiveCorrelation
Real training price (last value in training data): 3.4522
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 3.1966     4.0220       125.82
DecisionTree        3.4522     0.3755        10.88
Naive               3.4522     3.4522       100.00
ExpSmoothing        3.4567    16.4869       476.95
ARIMA               3.4498     4.4155       127.99
Sparse              3.1575     0.0000         0.00
Linear              3.3221     0.2455         7.39
--------------------------------------------------
Median              3.4498     3.4522       100.00
Mean                3.3553     4.1425       121.29

============================================================
Asset: WaterForce
Real training price (last value in training data): 2.8332
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 3.0084     3.5301       117.34
DecisionTree        2.8332     4.3917       155.01
Naive               2.8332     2.8332       100.00
ExpSmoothing        2.8341     0.1194         4.21
ARIMA               2.8314     1.9700        69.58
Sparse              3.0083     0.0000         0.00
Linear              3.6786     5.2370       142.37
--------------------------------------------------
Median              2.8341     2.8332       100.00
Mean                3.0039     2.5831        84.07

============================================================
Asset: Electronics123
Real training price (last value in training data): 7.0791
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 7.1507     6.8433        95.70
DecisionTree        7.0791     5.9075        83.45
Naive               7.0791     7.0791       100.00
ExpSmoothing        7.2113     3.8513        53.41
ARIMA               7.2006     7.1242        98.94
Sparse              7.3697     6.4980        88.17
Linear              6.9725     5.8009        83.20
--------------------------------------------------
Median              7.1507     6.4980        88.17
Mean                7.1519     6.1577        86.12

============================================================
Asset: RoboticsX
Real training price (last value in training data): 4.4676
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 4.4716     5.2147       116.62
DecisionTree        4.4676     6.4657       144.72
Naive               4.4676     4.4676       100.00
ExpSmoothing        4.4328     1.4151        31.92
ARIMA               4.5043     5.4782       121.62
Sparse              4.4876     0.0000         0.00
Linear              6.0182     8.0163       133.20
--------------------------------------------------
Median              4.4716     5.2147       116.62
Mean                4.6928     4.4368        92.58

============================================================
Asset: Apples
Real training price (last value in training data): 4.0061
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 3.9539     4.3519       110.07
DecisionTree        4.0061    -0.3381        -8.44
Naive               4.0061     4.0061       100.00
ExpSmoothing        4.1689    11.1636       267.78
ARIMA               4.2264     4.3909       103.89
Sparse              3.7707     0.0000         0.00
Linear              5.1556     0.8114        15.74
--------------------------------------------------
Median              4.0061     4.0061       100.00
Mean                4.1840     3.4837        84.15

============================================================
Asset: BetterTomorrow
Real training price (last value in training data): 5.5067
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 5.5855     5.4887        98.27
DecisionTree        5.5067     6.6108       120.05
Naive               5.5067     5.5067       100.00
ExpSmoothing        5.7371     5.4312        94.67
ARIMA               5.6879     5.4010        94.95
Sparse              5.8045    10.6659       183.75
Linear              5.6146     6.7186       119.66
--------------------------------------------------
Median              5.6146     5.5067       100.00
Mean                5.6347     6.5461       115.91

============================================================
Asset: SpaceNow
Real training price (last value in training data): 5.6439
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 5.7359     5.5234        96.30
DecisionTree        5.6439     5.8384       103.45
Naive               5.6439     5.6439       100.00
ExpSmoothing        5.7533     3.4518        60.00
ARIMA               5.7685     5.6312        97.62
Sparse              5.6534     4.4997        79.59
Linear              5.6289     5.8234       103.46
--------------------------------------------------
Median              5.6534     5.6312        97.62
Mean                5.6897     5.2017        91.49

============================================================
Asset: Lasers
Real training price (last value in training data): 6.1223
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 5.7381     7.3575       128.22
DecisionTree        6.1223     3.9896        65.17
Naive               6.1223     6.1223       100.00
ExpSmoothing        5.9820     3.7861        63.29
ARIMA               6.0733     6.0858       100.20
Sparse              5.9703     0.9304        15.58
Linear              6.1939     4.0612        65.57
--------------------------------------------------
Median              6.0733     4.0612        65.57
Mean                6.0289     4.6190        76.86

============================================================
Asset: SuperFuture
Real training price (last value in training data): 3.8392
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 3.4711     0.0000         0.00
DecisionTree        3.8392     6.1281       159.62
Naive               3.8392     3.8392       100.00
ExpSmoothing        3.6543     0.2507         6.86
ARIMA               3.6090     1.6457        45.60
Sparse              3.6170     0.0000         0.00
Linear              6.1167     8.4056       137.42
--------------------------------------------------
Median              3.6543     1.6457        45.60
Mean                4.0209     2.8956        64.21

============================================================
Asset: ABCDE
Real training price (last value in training data): 3.1961
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 3.1237     2.7094        86.74
DecisionTree        3.1961     5.1690       161.73
Naive               3.1961     3.1961       100.00
ExpSmoothing        3.1663     2.2913        72.37
ARIMA               3.1499     3.0396        96.50
Sparse              3.3762     8.1929       242.66
Linear              3.0490     5.0219       164.71
--------------------------------------------------
Median              3.1663     3.1961       100.00
Mean                3.1796     4.2315       132.10

Markowitz Model using mean results¶

In [71]:
# Extract expected returns from mean results
expected_returns = np.array([results["mean"][asset]["predicted_return"] / 100.0 - 1 for asset in asset_names])
n_assets = len(asset_names)

# Compute covariance matrix using historical data
historical_returns = []
for i, asset in enumerate(asset_names):
    # Get prices during training period
    prices = np.array(asset_prices[i])
    times = np.array(asset_times[i])
    mask = (times >= 0) & (times <= 100)
    asset_prices_train = prices[mask]
    
    # Calculate returns
    asset_returns = np.diff(asset_prices_train) / asset_prices_train[:-1]
    historical_returns.append(asset_returns)

# Ensure all historical returns have the same length
min_length = min(len(returns) for returns in historical_returns)
historical_returns = [returns[:min_length] for returns in historical_returns]
historical_returns = np.array(historical_returns)

# Compute covariance matrix
cov_matrix = np.cov(historical_returns)

# Ensure covariance matrix is positive semi-definite
eigenvalues = np.linalg.eigvalsh(cov_matrix)
if np.any(eigenvalues < 0):
    cov_matrix += np.eye(n_assets) * 1e-8

print("Expected Returns (sample):")
for i in range(min(5, n_assets)):
    print(f"{asset_names[i]}: {expected_returns[i]:.4f}")

print("\nCovariance Matrix (sample):")
print(cov_matrix[:3, :3])

# Find minimum risk portfolio
def solve_min_risk():
    """Find the portfolio with minimum risk"""
    P = matrix(cov_matrix)
    q = matrix(np.zeros(n_assets))
    
    # Constraint: w ≥ 0
    G = matrix(-np.eye(n_assets))
    h = matrix(np.zeros(n_assets))
    
    # Constraint: sum(w) = 1
    A = matrix(np.ones((1, n_assets)))
    b = matrix(np.ones(1))
    
    sol = solvers.qp(P, q, G, h, A, b)
    weights = np.array(sol['x']).flatten()
    port_return = np.dot(weights, expected_returns)
    port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    
    return weights, port_risk, port_return

# Find maximum return portfolio
def solve_max_return():
    """Find the portfolio with maximum return"""
    c = matrix(-expected_returns)  # Negative because we minimize -r^T w
    
    # Constraint: w ≥ 0
    G = matrix(-np.eye(n_assets))
    h = matrix(np.zeros(n_assets))
    
    # Constraint: sum(w) = 1
    A = matrix(np.ones((1, n_assets)))
    b = matrix(np.ones(1))
    
    sol = solvers.lp(c, G, h, A, b)
    weights = np.array(sol['x']).flatten()
    port_return = np.dot(weights, expected_returns)
    port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    
    return weights, port_return, port_risk

# Find the extreme portfolios
min_risk_weights, min_risk_value, min_risk_return = solve_min_risk()
max_return_weights, max_return_value, max_return_risk = solve_max_return()

print("\nMinimum Risk Portfolio:")
print(f"Risk: {min_risk_value:.6f}")
print(f"Return: {min_risk_return:.6f}")

print("\nMaximum Return Portfolio:")
print(f"Return: {max_return_value:.6f}")
print(f"Risk: {max_return_risk:.6f}")
Expected Returns (sample):
SafeAndCare: -0.0336
Moneymakers: 0.1584
Fuel4: -0.1976
CPU-XYZ: 0.1936
MarsProject: 0.1217

Covariance Matrix (sample):
[[ 1.01854023e-04  6.69455358e-05 -1.71112855e-05]
 [ 6.69455358e-05  1.24950579e-03  1.45858732e-04]
 [-1.71112855e-05  1.45858732e-04  2.97224717e-04]]

Minimum Risk Portfolio:
Risk: 0.003115
Return: 0.064974

Maximum Return Portfolio:
Return: 0.326109
Risk: 0.009110
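As a sanity check on the QP solver's minimum-risk output, note that when the non-negativity constraints are inactive the minimum-variance weights have the closed form $\boldsymbol{w}^* = \boldsymbol{\Sigma}^{-1}\boldsymbol{1} / (\boldsymbol{1}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{1})$. A pure-NumPy sketch on a hypothetical 2-asset covariance (not the notebook's data):

```python
import numpy as np

# Closed-form minimum-variance weights, w* = S^{-1} 1 / (1^T S^{-1} 1),
# valid when the non-negativity constraints do not bind.
cov = np.array([[0.04, 0.00],
                [0.00, 0.01]])  # toy diagonal covariance
ones = np.ones(2)
w = np.linalg.solve(cov, ones)  # S^{-1} 1
w /= w.sum()                    # normalize so weights sum to 1
print(w)  # [0.2 0.8] -- inverse-variance weighting
```

With a diagonal covariance this reduces to inverse-variance weighting, which is a quick way to verify that `solve_min_risk` returns sensible weights on simple inputs.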

Implementation of the two multi-objective methods:¶

  • Weighted Sum Method (WSM)
  • Epsilon-Constraint Method (ECM)
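In the notation of the tutorial above, the two scalarizations implemented below are, for portfolio weights $\boldsymbol{w}$, expected returns $\boldsymbol{r}$, covariance matrix $\boldsymbol{\Sigma}$, and trade-off weights $\lambda_{ret} + \lambda_{risk} = 1$ (called `w_return` and `w_risk` in the code):

WSM: $min$ $\lambda_{risk}\dfrac{1}{2}\boldsymbol{w}^T\boldsymbol{\Sigma}\boldsymbol{w} - \lambda_{ret}\boldsymbol{r}^T\boldsymbol{w}$, $s.t.$ $\boldsymbol{1}^T\boldsymbol{w} = 1$, $\boldsymbol{w} \geq \boldsymbol{0}$

ECM: $min$ $\dfrac{1}{2}\boldsymbol{w}^T\boldsymbol{\Sigma}\boldsymbol{w}$, $s.t.$ $\boldsymbol{r}^T\boldsymbol{w} \geq \varepsilon$, $\boldsymbol{1}^T\boldsymbol{w} = 1$, $\boldsymbol{w} \geq \boldsymbol{0}$

Each WSM weight vector and each ECM threshold $\varepsilon$ yields one QP of the form solvers.qp expects.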
In [72]:
# Suppress solver output
solvers.options['show_progress'] = False

# Weighted Sum Method (WSM)
def weighted_sum_method(n_weights=10, normalize=True):
    """
    Implements the Weighted Sum Method for portfolio optimization.
    
    Args:
        n_weights: Number of weight vectors to use
        normalize: Whether to scale both objectives by their ranges
    
    Returns:
        List of Pareto optimal solutions (weights, return, risk)
    """
    # Set up normalization ranges if needed
    if normalize:
        return_range = max_return_value - min_risk_return
        risk_range = max_return_risk - min_risk_value
    
    # Generate weight vectors
    weight_vectors = []
    for i in range(n_weights):
        w_return = i / (n_weights - 1)  # Weight for return
        w_risk = 1 - w_return           # Weight for risk
        weight_vectors.append((w_return, w_risk))
    
    # Solve for each weight vector
    solutions = []
    for w_return, w_risk in weight_vectors:
        # Setup quadratic programming parameters
        P = matrix(cov_matrix)
        q = matrix(np.zeros(n_assets))
        
        if normalize:
            # For normalized objective function
            q_mod = matrix(-w_return * expected_returns / return_range)
            P_mod = matrix(w_risk * cov_matrix / risk_range)
        else:
            # For non-normalized objective function
            q_mod = matrix(-w_return * expected_returns)
            P_mod = matrix(w_risk * cov_matrix)
        
        # Constraint: w ≥ 0
        G = matrix(-np.eye(n_assets))
        h = matrix(np.zeros(n_assets))
        
        # Constraint: sum(w) = 1
        A = matrix(np.ones((1, n_assets)))
        b = matrix(np.ones(1))
        
        # Solve QP
        sol = solvers.qp(P_mod, q_mod, G, h, A, b)
        
        if sol['status'] != 'optimal':
            print(f"Warning: Optimization did not reach an optimal solution for weights {w_return}, {w_risk}. Status: {sol['status']}")
            continue
        
        weights = np.array(sol['x']).flatten()
        port_return = np.dot(weights, expected_returns)
        port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
        
        solutions.append((weights, port_return, port_risk))
    
    # Filter duplicate solutions
    unique_solutions = []
    for sol in solutions:
        is_unique = True
        for existing_sol in unique_solutions:
            if (np.abs(sol[1] - existing_sol[1]) < 1e-6 and 
                np.abs(sol[2] - existing_sol[2]) < 1e-6):
                is_unique = False
                break
        if is_unique:
            unique_solutions.append(sol)
    
    return unique_solutions

# Epsilon-Constraint Method (ECM)
def epsilon_constraint_method(n_thresholds=10):
    """
    Implements the Epsilon-Constraint Method for portfolio optimization.
    
    Args:
        n_thresholds: Number of threshold values to use
    
    Returns:
        List of Pareto optimal solutions (weights, return, risk)
    """
    # Generate threshold values
    thresholds = []
    for i in range(n_thresholds):
        t = min_risk_return + (i / (n_thresholds - 1)) * (max_return_value - min_risk_return)
        thresholds.append(t)
    
    # Solve for each threshold
    solutions = []
    for threshold in thresholds:
        # Setup quadratic programming parameters
        P = matrix(cov_matrix)
        q = matrix(np.zeros(n_assets))
        
        # Constraint: w ≥ 0
        G = matrix(-np.eye(n_assets))
        h = matrix(np.zeros(n_assets))
        
        # Equality constraints: sum(w) = 1 and r^T w = threshold.
        # Note: the return target is imposed as an equality here; since the
        # thresholds span [min-risk return, max return], this traces the same
        # efficient frontier as the usual r^T w >= threshold formulation.
        A = matrix(np.vstack((
            np.ones(n_assets),     # sum(w) = 1
            expected_returns       # r^T w = threshold
        )))
        b = matrix(np.array([1.0, threshold]))
        
        # Solve QP
        sol = solvers.qp(P, q, G, h, A, b)
        
        if sol['status'] != 'optimal':
            print(f"Warning: Optimization did not reach an optimal solution for threshold {threshold}. Status: {sol['status']}")
            continue
        
        weights = np.array(sol['x']).flatten()
        port_return = np.dot(weights, expected_returns)
        port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
        
        solutions.append((weights, port_return, port_risk))
    
    # Filter duplicate solutions
    unique_solutions = []
    for sol in solutions:
        is_unique = True
        for existing_sol in unique_solutions:
            if (np.abs(sol[1] - existing_sol[1]) < 1e-6 and 
                np.abs(sol[2] - existing_sol[2]) < 1e-6):
                is_unique = False
                break
        if is_unique:
            unique_solutions.append(sol)
    
    return unique_solutions

# Function to plot Pareto front
def plot_pareto_front(solutions, title):
    risks = [sol[2] for sol in solutions]
    returns = [sol[1] for sol in solutions]
    
    plt.figure(figsize=(10, 6))
    plt.scatter(risks, returns, c='red', marker='o')
    plt.xlabel('Risk')
    plt.ylabel('Expected Return')
    plt.title(title)
    plt.grid(True)
    plt.show()

# Test the methods with a small number of points
wsm_solutions = weighted_sum_method(10)
ecm_solutions = epsilon_constraint_method(10)

print(f"WSM found {len(wsm_solutions)} unique solutions")
print(f"ECM found {len(ecm_solutions)} unique solutions")

plot_pareto_front(wsm_solutions, "WSM Pareto Front")
plot_pareto_front(ecm_solutions, "ECM Pareto Front")
WSM found 6 unique solutions
ECM found 10 unique solutions

Experimentation:¶

  • Running WSM and ECM with different numbers of sample points.
  • Plotting the resulting Pareto fronts.
  • Comparing how many unique solutions each approach yields.
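Both methods above deduplicate solutions with an O(n²) pairwise comparison, which becomes noticeable at 1000 points. A hashed variant gives the same effect in linear time; a minimal sketch (hypothetical helper, not part of the notebook's code), with the caveat that quantizing to a grid can differ from pairwise tolerance checks at bucket boundaries:

```python
import numpy as np

def unique_by_objectives(solutions, tol=1e-6):
    """Keep one solution per (return, risk) pair, quantized to tolerance tol."""
    seen, unique = set(), []
    for sol in solutions:
        # sol = (weights, port_return, port_risk); hash the quantized objectives
        key = (round(sol[1] / tol), round(sol[2] / tol))
        if key not in seen:
            seen.add(key)
            unique.append(sol)
    return unique

# Example: two solutions whose objectives differ by less than tol collapse to one
sols = [(np.ones(3) / 3, 0.10, 0.02), (np.ones(3) / 3, 0.10 + 1e-9, 0.02)]
print(len(unique_by_objectives(sols)))  # 1
```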
In [76]:
# Run WSM and ECM with different numbers of weight vectors/thresholds
point_counts = [10, 20, 100, 1000]
wsm_results = {}
ecm_results = {}

# Define different markers for each point count
markers = {
    10: 'o',    # circle
    20: 's',    # square
    100: 'd',   # diamond
    1000: '*'   # star
}

# Colors for each point count
colors = {
    10: '#1f77b4',    # blue
    20: '#ff7f0e',    # orange
    100: '#d62728',   # red
    1000: '#9467bd'   # purple
}

# Suppress solver output
solvers.options['show_progress'] = False

for count in point_counts:
    # Run WSM with normalization
    wsm_solutions = weighted_sum_method(count, normalize=True)
    wsm_results[count] = {
        'solutions': wsm_solutions,
        'unique_count': len(wsm_solutions)
    }
    
    # Run ECM
    ecm_solutions = epsilon_constraint_method(count)
    ecm_results[count] = {
        'solutions': ecm_solutions,
        'unique_count': len(ecm_solutions)
    }

# Bar chart for number of unique solutions
plt.figure(figsize=(12, 6))
x = np.arange(len(point_counts))
width = 0.35

wsm_counts = [wsm_results[c]['unique_count'] for c in point_counts]
ecm_counts = [ecm_results[c]['unique_count'] for c in point_counts]

plt.bar(x - width/2, wsm_counts, width, label='WSM', color='royalblue')
plt.bar(x + width/2, ecm_counts, width, label='ECM', color='tomato')

plt.xlabel('Number of Points', fontsize=14)
plt.ylabel('Number of Unique Solutions', fontsize=14)
plt.title('Comparison of Methods: Unique Solutions Found', fontsize=16)
plt.xticks(x, point_counts, fontsize=12)
plt.yticks(fontsize=12)
plt.legend(fontsize=14)
plt.grid(True, alpha=0.3)

# Add value labels on top of each bar
for i, v in enumerate(wsm_counts):
    plt.text(i - width/2, v + 0.5, str(v), ha='center', fontsize=12)
for i, v in enumerate(ecm_counts):
    plt.text(i + width/2, v + 0.5, str(v), ha='center', fontsize=12)

plt.tight_layout()
plt.show()

# Create separate plots for each point count
for count in point_counts:
    plt.figure(figsize=(10, 6))
    
    wsm_solutions = wsm_results[count]['solutions']
    ecm_solutions = ecm_results[count]['solutions']
    
    wsm_risks = [sol[2] for sol in wsm_solutions]
    wsm_returns = [sol[1] for sol in wsm_solutions]
    
    ecm_risks = [sol[2] for sol in ecm_solutions]
    ecm_returns = [sol[1] for sol in ecm_solutions]
    
    # Calculate marker sampling for both methods
    wsm_n = max(1, len(wsm_risks) // min(50, len(wsm_risks)))
    ecm_n = max(1, len(ecm_risks) // min(50, len(ecm_risks)))
    
    plt.scatter(wsm_risks[::wsm_n], wsm_returns[::wsm_n], marker='o', s=100, 
                label=f'WSM (showing {len(wsm_risks[::wsm_n])} of {len(wsm_risks)} points)', 
                color='royalblue', alpha=0.8, edgecolors='navy')
    plt.scatter(ecm_risks[::ecm_n], ecm_returns[::ecm_n], marker='x', s=100, 
                label=f'ECM (showing {len(ecm_risks[::ecm_n])} of {len(ecm_risks)} points)', 
                color='tomato', alpha=0.8)
    
    # Add connecting lines to show the fronts more clearly
    wsm_points = sorted(zip(wsm_risks, wsm_returns), key=lambda x: x[0])
    ecm_points = sorted(zip(ecm_risks, ecm_returns), key=lambda x: x[0])
    
    if wsm_points:
        wsm_x, wsm_y = zip(*wsm_points)
        plt.plot(wsm_x, wsm_y, '-', color='royalblue', alpha=0.7, linewidth=2.0)
    
    if ecm_points:
        ecm_x, ecm_y = zip(*ecm_points)
        plt.plot(ecm_x, ecm_y, '-', color='tomato', alpha=0.7, linewidth=2.0)
    
    plt.xlabel('Risk', fontsize=14)
    plt.ylabel('Expected Return', fontsize=14)
    plt.title(f'Pareto Front Comparison with {count} Points', fontsize=16)
    plt.grid(True, alpha=0.3)
    plt.legend(fontsize=12, loc='best')
    
    plt.tight_layout()
    plt.show()

# Combined plot with WSM only
plt.figure(figsize=(10, 6))

for count in point_counts:
    solutions = wsm_results[count]['solutions']
    risks = [sol[2] for sol in solutions]
    returns = [sol[1] for sol in solutions]
    
    # Connect points for better visualization
    points = sorted(zip(risks, returns), key=lambda x: x[0])
    if points:
        x_vals, y_vals = zip(*points)
        
        # Use larger, more distinct markers and plot more of them
        n = max(1, len(x_vals) // min(60, len(x_vals)))
        plt.scatter(x_vals[::n], y_vals[::n], s=150, 
                   label=f'WSM ({count} points, {len(x_vals)} solutions)', 
                   color=colors[count], marker=markers[count], alpha=0.9, 
                   edgecolors='black', linewidth=1.5)
        
        # Use different line styles for different point counts
        plt.plot(x_vals, y_vals, color=colors[count], 
                alpha=0.7, linewidth=2.5)

plt.xlabel('Risk', fontsize=14)
plt.ylabel('Expected Return', fontsize=14)
plt.title('WSM Pareto Fronts Comparison (simplified)', fontsize=16)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=13, loc='lower right')
plt.tight_layout()
plt.show()

# Combined plot with ECM only - 10 vs 1000 comparison
plt.figure(figsize=(10, 6))

# Only compare 10 and 1000 points
comparison_counts = [10, 1000]

for count in comparison_counts:
    solutions = ecm_results[count]['solutions']
    risks = [sol[2] for sol in solutions]
    returns = [sol[1] for sol in solutions]
    
    # Connect points for better visualization
    points = sorted(zip(risks, returns), key=lambda x: x[0])
    if points:
        x_vals, y_vals = zip(*points)
        
        # Use larger, more distinct markers and plot more of them
        n = max(1, len(x_vals) // min(40, len(x_vals)))
        plt.scatter(x_vals[::n], y_vals[::n], s=150, 
                   label=f'ECM ({count} points, {len(x_vals)} solutions)', 
                   color=colors[count], marker=markers[count], alpha=0.9, 
                   edgecolors='black', linewidth=1.5)
        
        # Use different line styles for different point counts
        plt.plot(x_vals, y_vals, color=colors[count], 
                alpha=0.7, linewidth=2.5)

plt.xlabel('Risk', fontsize=14)
plt.ylabel('Expected Return', fontsize=14)
plt.title('ECM Pareto Fronts Comparison (simplified)', fontsize=16)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=13, loc='lower right')
plt.tight_layout()
plt.show()

Conclusions¶

The Epsilon-Constraint Method (ECM) generally finds more unique Pareto optimal solutions compared to the Weighted Sum Method (WSM), even when WSM uses normalization. This confirms the theoretical advantages of ECM, particularly its ability to find solutions in concave regions of the Pareto front, which WSM may miss.

Key observations from the visualizations:

  1. ECM produces a much more complete and well-distributed Pareto front
  2. WSM tends to cluster solutions at the extremes, with gaps in between
  3. As the number of points increases, ECM continues to find new unique solutions along the entire Pareto front while WSM finds fewer new solutions
  4. Even with 1000 points, WSM still cannot adequately capture the concave regions
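The second and fourth observations can be illustrated on a toy non-convex front (toy data, both objectives minimized, so the second objective plays the role of negative return): a Pareto-optimal point in a concave region is never the minimizer of any weighted sum, yet an epsilon constraint recovers it directly.

```python
import numpy as np

# Three Pareto-optimal points (minimize both coordinates);
# C lies in a concave (non-convex) part of the front.
A, B, C = np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([0.6, 0.6])
points = np.array([A, B, C])

# WSM: for every trade-off weight the minimizer is A or B, never C.
wsm_picks = set()
for w in np.linspace(0, 1, 101):
    scores = w * points[:, 0] + (1 - w) * points[:, 1]
    wsm_picks.add(int(np.argmin(scores)))
print(wsm_picks)  # {0, 1} -- C (index 2) is never selected

# ECM: minimize f1 subject to f2 <= eps; eps = 0.6 recovers C exactly.
eps = 0.6
feasible = points[points[:, 1] <= eps]
best = feasible[np.argmin(feasible[:, 0])]
print(best)  # [0.6 0.6]
```

C scores 0.6 under every weighted sum, while min(w, 1 - w) for the extremes never exceeds 0.5, so no weight vector can select C; this is exactly the gap pattern seen in the WSM fronts above.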

Portfolio Game¶

In [78]:
# Define the asset names in the correct order
asset_names_ordered = [
    "SuperFuture", "Apples", "WorldNow", "Electronics123", "Photons",
    "SpaceNow", "PearPear", "PositiveCorrelation", "BetterTechnology", "ABCDE",
    "EnviroLike", "Moneymakers", "Fuel4", "MarsProject", "CPU-XYZ",
    "RoboticsX", "Lasers", "WaterForce", "SafeAndCare", "BetterTomorrow"
]

# Combine solutions from both methods for a more comprehensive Pareto front
wsm_solutions = wsm_results[1000]['solutions']
ecm_solutions = ecm_results[1000]['solutions']
all_solutions = wsm_solutions + ecm_solutions

# Analyze the risk and return characteristics of all portfolios
risks = [sol[2] for sol in all_solutions]
returns = [sol[1] for sol in all_solutions]

# Calculate return threshold (median return)
return_50th = np.percentile(returns, 50)
print(f"Return threshold (median): {return_50th:.6f}")

# Find MinRisk portfolio with above-median return
selected_idx = -1
min_risk_value = float('inf')

for i, (weights, port_return, port_risk) in enumerate(all_solutions):
    if port_return > return_50th and port_risk < min_risk_value:
        min_risk_value = port_risk
        selected_idx = i

# Get the selected portfolio
selected_weights, selected_return, selected_risk = all_solutions[selected_idx]
selected_source = "WSM" if selected_idx < len(wsm_solutions) else "ECM"
selected_sharpe = selected_return / selected_risk if selected_risk > 0 else 0

print(f"\nSelected MinRisk Portfolio (from {selected_source}):")
print(f"Expected Return: {selected_return:.6f}")
print(f"Expected Risk: {selected_risk:.6f}")
print(f"Sharpe Ratio: {selected_sharpe:.6f}")
print(f"Strategy: Minimum risk with above-median return")

# Map weights to the correct order
portfolio_weights = {}
for i, weight in enumerate(selected_weights):
    portfolio_weights[asset_names[i]] = weight

# Create ordered weights list
ordered_weights = []
for asset in asset_names_ordered:
    if asset in portfolio_weights:
        ordered_weights.append(portfolio_weights[asset])
    else:
        ordered_weights.append(0.0)

# Assert weights sum to 1
sum_weights = sum(ordered_weights)
assert abs(sum_weights - 1.0) < 1e-10, f"Portfolio weights must sum to 1.0. Current sum: {sum_weights}"

print(f"Sum of weights: {sum(ordered_weights)}")

# Plot the Pareto fronts with the selected portfolio
plt.figure(figsize=(10, 6))

# Sample solutions for clearer visualization
wsm_sample_rate = max(1, len(wsm_solutions) // min(50, len(wsm_solutions)))
wsm_sampled = wsm_solutions[::wsm_sample_rate]
ecm_sample_rate = max(1, len(ecm_solutions) // min(50, len(ecm_solutions)))
ecm_sampled = ecm_solutions[::ecm_sample_rate]

# Plot each method's front with improved visibility
plt.scatter([sol[2] for sol in wsm_sampled], [sol[1] for sol in wsm_sampled], 
           c='blue', s=40, alpha=0.6, label='WSM')
plt.scatter([sol[2] for sol in ecm_sampled], [sol[1] for sol in ecm_sampled], 
           c='green', s=40, alpha=0.6, label='ECM')

# Draw lines connecting the points to better visualize the Pareto fronts
wsm_points = sorted([(sol[2], sol[1]) for sol in wsm_solutions], key=lambda x: x[0])
ecm_points = sorted([(sol[2], sol[1]) for sol in ecm_solutions], key=lambda x: x[0])
if wsm_points:
    wsm_x, wsm_y = zip(*wsm_points)
    plt.plot(wsm_x, wsm_y, '-', color='blue', alpha=0.3)
if ecm_points:
    ecm_x, ecm_y = zip(*ecm_points)
    plt.plot(ecm_x, ecm_y, '-', color='green', alpha=0.3)

# Highlight the selected portfolio
plt.scatter([selected_risk], [selected_return], c='red', s=200, marker='*', 
            edgecolors='black', linewidth=1.5, label='MinRisk Portfolio')

# Add a horizontal line at the median return
plt.axhline(y=return_50th, color='grey', linestyle='--', alpha=0.7)
plt.text(min(risks), return_50th+0.01, 'Median Return', fontsize=10)

# Add annotation for the selected portfolio
plt.annotate(f"Return: {selected_return:.4f}\nRisk: {selected_risk:.4f}",
             xy=(selected_risk, selected_return), 
             xytext=(selected_risk+0.0008, selected_return-0.05),
             arrowprops=dict(facecolor='black', shrink=0.05),
             bbox=dict(boxstyle="round", fc="white", alpha=0.9))

plt.xlabel('Risk', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title('MinRisk Portfolio Selection', fontsize=14)
plt.grid(alpha=0.3)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()

# Create a bar chart of the portfolio weights
plt.figure(figsize=(12, 6))

# Convert the net return fraction to the final value of a 100-unit stake
selected_return = 100 + selected_return * 100

# Get non-zero weights and sort by weight
asset_weight_pairs = [(asset, weight) for asset, weight in zip(asset_names_ordered, ordered_weights) 
                     if weight > 0.001]
asset_weight_pairs.sort(key=lambda x: x[1], reverse=True)
sorted_assets = [pair[0] for pair in asset_weight_pairs]
sorted_weights = [pair[1] for pair in asset_weight_pairs]

# Create bar chart
bars = plt.bar(sorted_assets, sorted_weights, color='skyblue', edgecolor='blue')

# Add weight values on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + 0.005,
             f'{height:.2%}', ha='center', va='bottom')

plt.xlabel('Assets', fontsize=12)
plt.ylabel('Weight', fontsize=12)
plt.title('MinRisk Portfolio Asset Allocation', fontsize=14)
plt.xticks(rotation=45, ha='right')
plt.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

# Save the portfolio to a file
output_filename = "151958_151960.txt"
with open(output_filename, 'w') as f:
    f.write(f"{selected_return:.2f} {selected_risk:.6f}")
    for weight in ordered_weights:
        f.write(f" {weight:.6f}")

print(f"Portfolio saved to {output_filename}")
Return threshold (median): 0.212925

Selected MinRisk Portfolio (from ECM):
Expected Return: 0.213186
Expected Risk: 0.004536
Sharpe Ratio: 46.993663
Strategy: Minimum risk with above-median return
Sum of weights: 1.0
Portfolio saved to 151958_151960.txt

Our Selected Portfolio¶

We chose a portfolio optimization strategy that balances risk minimization with reasonable returns.

  • Expected Return: 121.32 (21.32% expected growth)
  • Risk Level: 0.0045 (very low volatility)
  • Position on Pareto Front: We selected the minimum-risk portfolio among those with above-median return, which sits near the "elbow" of the front where the risk-return trade-off is most favorable
  • Method Used: We found the ECM (Epsilon-Constraint Method) produced better results for our specific criteria
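The headline figure follows directly from the selected portfolio's net return (0.213186, from the ECM output above) applied to a 100-unit initial stake:

```python
# Final value of a 100-unit stake at the selected portfolio's net return
expected_return = 0.213186                 # from the selected ECM portfolio
final_value = 100 * (1 + expected_return)  # = 100 + return in percent
print(f"{final_value:.2f}")  # 121.32
```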

Our Asset Allocation¶

Our analysis led us to a focused investment strategy:

  • Photons (52.15%): We chose this as our primary investment due to its excellent risk-adjusted returns
  • SafeAndCare (15.97%): We allocated a significant portion here for stability
  • WorldNow (11.02%): We included this for additional growth potential
  • Diversified holdings: We spread smaller investments across 9 additional assets to mitigate specific asset risks